ABSTRACT

Consider the linear regression model $Y = X\theta + e$, where $\theta$ is an unknown parameter vector to be estimated. A class of estimators, known as ridge estimators, is given by $\tilde\theta = (X'X + KI)^{-1}X'Y$, where $K$ is a constant or a function of $Y$. The ridge estimator is a suitable alternative to the least squares estimator when the matrix $X'X$ is nearly singular. A number of papers have appeared in the statistical literature in recent years, giving empirical evaluations of various ridge estimators. This paper gives a theoretical discussion of some properties of the ridge estimators.
INTRODUCTION

Consider the linear regression model

$$Y = X\theta + e \qquad (1.1)$$

where $Y$ is an $n \times 1$ vector of observations, $X$ is an $n \times p$ design matrix of rank $p$, $\theta$ is a $p \times 1$ vector of unknown parameters and $e$ is an $n \times 1$ vector of observational errors. Let the components of $e$ be uncorrelated and have zero mean and a common variance equal to $\sigma^2$, say. The usual estimator of $\theta$ is derived by the least squares method, that is, by minimizing $(Y - X\theta)'(Y - X\theta)$ with respect to $\theta$, and is given by

$$\hat\theta = (X'X)^{-1}X'Y \qquad (1.2)$$

where prime denotes the transpose of a matrix. Clearly, $\hat\theta$ is an unbiased estimator of $\theta$. Let $\lambda_1, \dots, \lambda_p$ denote the characteristic roots of $X'X$. The mean squared error (MSE) of $\hat\theta$ is given by

$$\mathrm{MSE}\,\hat\theta = E(\hat\theta - \theta)'(\hat\theta - \theta) = \sigma^2 \sum_{i=1}^{p} \frac{1}{\lambda_i}. \qquad (1.3)$$


In applications of multiple linear regression, the matrix $X'X$ is often nearly singular. This is due to some interrelation between the explanatory variables, technically called multicollinearity. The least squares estimator of the regression coefficients tends to become "unstable" in the presence of multicollinearity. More precisely, the variance of the estimates of some of the regression coefficients becomes large. This is shown by (1.3), where a small root $\lambda_i$ contributes the large term $\sigma^2/\lambda_i$. For this case Hoerl (1962) and Hoerl and Kennard (1970a, 1970b) suggested a class of estimators known as ridge estimators as an alternative to the least squares estimator. The ridge estimator is given by

$$\tilde\theta = (X'X + KI)^{-1}X'Y \qquad (1.4)$$

where $I$ denotes an identity matrix and $K$ is a positive number or a suitable function of $Y$. Clearly, $\tilde\theta$ is a biased estimator of $\theta$. The new method of estimation is called ridge estimation.

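Let $P$ be an orthogonal matrix diagonalizing $X'X$, that is,

$$P X'X P' = D \qquad (1.5)$$

where $D$ is a diagonal matrix with the $i$th diagonal element equal to $\lambda_i$. Let $\alpha = (\alpha_1, \dots, \alpha_p)' = P\theta$. If $K$ is a constant, the mean squared error of $\tilde\theta$ is given by

$$\mathrm{MSE}\,\tilde\theta = E(\tilde\theta - \theta)'(\tilde\theta - \theta) = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + K)^2} + K^2 \sum_{i=1}^{p} \frac{\alpha_i^2}{(\lambda_i + K)^2}. \qquad (1.6)$$
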

Comparing  (1.3)  with  (1.6)  we  observe  that  the  effect  of 
multicollinearity  of  the  explanatory  variables  in  the  design 
matrix  on  the  mean  squared  error  is  suitably  reduced  by  the 
ridge  estimation. 
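
The comparison of (1.3) with (1.6) can be illustrated numerically. The following is a minimal sketch, not part of the original paper: the function names and the values chosen for the roots $\lambda_i$, the coordinates $\alpha_i$, $\sigma^2$ and $K$ are hypothetical and serve only to show how one small root inflates (1.3) while (1.6) stays moderate.

```python
def mse_least_squares(eigenvalues, sigma2):
    """Equation (1.3): sigma^2 times the sum of reciprocal roots of X'X."""
    return sigma2 * sum(1.0 / lam for lam in eigenvalues)


def mse_ridge(eigenvalues, alpha, sigma2, K):
    """Equation (1.6) for a constant K >= 0: variance part plus squared bias."""
    variance = sigma2 * sum(lam / (lam + K) ** 2 for lam in eigenvalues)
    bias_sq = K ** 2 * sum(
        a ** 2 / (lam + K) ** 2 for lam, a in zip(eigenvalues, alpha)
    )
    return variance + bias_sq


# Hypothetical illustration: one root close to zero (near-singular X'X).
eigenvalues = [2.0, 1.0, 0.01]
alpha = [1.0, 1.0, 1.0]
sigma2 = 1.0

mse_ls = mse_least_squares(eigenvalues, sigma2)
mse_r = mse_ridge(eigenvalues, alpha, sigma2, K=0.1)
print("least squares MSE:", mse_ls)
print("ridge MSE (K=0.1):", mse_r)
```

Setting `K=0` in `mse_ridge` recovers (1.3), which is a quick consistency check on the two formulas.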

Applied statisticians have shown considerable interest in ridge estimation. Papers by Farebrother (1975), Hawkins (1975), Hemmerle (1975), Hoerl, Kennard and Baldwin (1975), McDonald (1975), McDonald and Galarneau (1975), Newhouse and Oman (1971) and Sidik (1975) may be cited for reference. Most of these papers deal with the empirical evaluation, based on simulation studies, of various ridge estimators and their comparison with the least squares estimator and other biased estimators. Since a large number of variables is involved in the regression problem, the given empirical results do not give sufficient insight into the operating characteristics of the ridge estimators. This paper gives a theoretical discussion, of an expository nature, of ridge estimation. Among other results it is shown that for a certain choice of $K$, depending on $Y$, the ridge estimator has uniformly smaller mean squared error than the least squares estimator if a number of the characteristic roots of $X'X$ are sufficiently small.

A generalized ridge estimator is given by

$$\tilde\theta_0 = (X'X + K_0)^{-1}X'Y \qquad (1.7)$$

where $K_0$ is a diagonal matrix. In this paper we consider only the ordinary ridge estimator, given by (1.4).
RIDGE ESTIMATION

The main results of the paper are given by the following theorems. First we give a derivation of the ridge estimator based on the least squares principle. A slightly different derivation based on the same principle was given by Hoerl and Kennard (1970a). Let $c$ be a positive number, and let $R(\theta) = (Y - X\theta)'(Y - X\theta)$.

Theorem 2.1. The value of $\theta$ minimizing $R(\theta)$, given $\theta'\theta \le c$, is equal to $\tilde\theta$, where $K$ is chosen such that $\tilde\theta'\tilde\theta = c$.

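Proof: From (1.4) and (1.5) we have

$$\tilde\theta'\tilde\theta = (PX'Y)'(D + KI)^{-2}(PX'Y). \qquad (2.1)$$

From (2.1) it is seen that $\tilde\theta'\tilde\theta$ is decreasing in $K$. Therefore, the value of $K$ given by $\tilde\theta'\tilde\theta = c$ is uniquely determined. We have

$$\begin{aligned}
R(\tilde\theta) &= (Y - X\tilde\theta)'(Y - X\tilde\theta) \\
&= (Y - X\hat\theta)'(Y - X\hat\theta) + (X'Y)'\left[(X'X + KI)^{-1} - (X'X)^{-1}\right] X'X \left[(X'X + KI)^{-1} - (X'X)^{-1}\right] X'Y \\
&= (Y - X\hat\theta)'(Y - X\hat\theta) + (PX'Y)'D^*(PX'Y)
\end{aligned} \qquad (2.2)$$

where $D^*$ is a $p \times p$ diagonal matrix whose $i$th diagonal element is equal to

$$\frac{K^2}{\lambda_i (K + \lambda_i)^2}.$$

It is seen from (2.2) that $R(\tilde\theta)$ is increasing in $K$.

Now, consider the problem of minimizing $R(\theta)$ with respect to $\theta$ under the constraint $\theta'\theta = c$. By the Lagrangian method the minimizing value of $\theta$ is given by

$$\lambda\theta - X'(Y - X\theta) = 0$$

or

$$\theta = (X'X + \lambda I)^{-1}X'Y$$

where the multiplier $\lambda$ is determined such that $\theta'\theta = c$. Thus $R(\theta)$ is minimized for $\theta = \tilde\theta$, where $K$ is determined such that $\tilde\theta'\tilde\theta = c$.

We have shown above that $R(\tilde\theta)$ is increasing in $K$ and that $\tilde\theta'\tilde\theta$ is decreasing in $K$. It follows that $\tilde\theta$, which is the minimizing value of $R(\theta)$ given $\theta'\theta = c$, is also the minimizing value of $R(\theta)$ given $\theta'\theta \le c$, where $K$ is determined from $\tilde\theta'\tilde\theta = c$. $\square$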
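
Because $\tilde\theta'\tilde\theta$ in (2.1) is decreasing in $K$, the value of $K$ that meets the constraint can be found by bisection. The sketch below is an illustration only: it works in the rotated coordinates $z = PX'Y$, and the function names and the numbers used for $z$, the roots and $c$ are hypothetical.

```python
def ridge_norm_sq(z, eigenvalues, K):
    """Squared length of the ridge estimator from (2.1), with z = P X'Y."""
    return sum((zi / (lam + K)) ** 2 for zi, lam in zip(z, eigenvalues))


def solve_ridge_constant(z, eigenvalues, c, iterations=200):
    """Find K >= 0 such that the ridge estimator has squared length c.

    Returns 0.0 when the least squares estimator already satisfies the
    constraint, so that no shrinkage is needed.
    """
    if ridge_norm_sq(z, eigenvalues, 0.0) <= c:
        return 0.0
    lo, hi = 0.0, 1.0
    # Grow the bracket until the squared length at hi drops to c or below.
    while ridge_norm_sq(z, eigenvalues, hi) > c:
        hi *= 2.0
    # Bisection is valid because the squared length is decreasing in K.
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ridge_norm_sq(z, eigenvalues, mid) > c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values for illustration.
z = [2.0, 1.0, 0.05]
eigenvalues = [2.0, 1.0, 0.01]
c = 1.5
K = solve_ridge_constant(z, eigenvalues, c)
print("K =", K, "squared length =", ridge_norm_sq(z, eigenvalues, K))
```

Bisection is used here only for simplicity; any one-dimensional root finder would serve, since the function is monotone.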

Remark 1. The above theorem gives an interesting comparison between the derivation of the least squares estimator and the ridge estimator. The ridge estimator is derived by minimizing $R(\theta)$ under a certain constraint on the value of $\theta'\theta$, whereas the least squares estimator is derived by minimizing $R(\theta)$ without that constraint.

The next theorem gives another derivation of the ridge estimator from a Bayesian approach, assuming that the prior distribution of $\theta$ and the conditional distribution of $Y$ given $\theta$ are both normal. The proof of the theorem is trivial. This result is also noted by Lindley and Smith (1972). The result implies that the ridge estimator for a constant value of $K$ is a Bayes estimator and admissible under squared error loss. The notation $Y \sim N(\mu, \Sigma)$ means that $Y$ has a (multivariate) normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$.

Theorem 2.2. If $Y \sim N(X\theta, \sigma^2 I)$ conditionally given $\theta$, and a priori $\theta \sim N(0, \tau^2 I)$, then the posterior mean of $\theta$ given $Y$ is equal to $\tilde\theta$ for $K = \sigma^2/\tau^2$.
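
Theorem 2.2 can be checked numerically in the simplest case of a single regressor ($p = 1$). The sketch below is an illustration under that assumption: it integrates the posterior on a grid and compares the result with the ridge formula for $K = \sigma^2/\tau^2$. The data, the grid limits and the function names are hypothetical.

```python
import math


def ridge_scalar(x, y, K):
    """Ridge estimator (1.4) when X has a single column x."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / (sxx + K)


def posterior_mean_numeric(x, y, sigma2, tau2, lo=-10.0, hi=10.0, steps=20000):
    """Posterior mean of theta by a Riemann sum over a uniform grid."""
    h = (hi - lo) / steps
    grid = [lo + k * h for k in range(steps + 1)]

    def log_posterior(theta):
        rss = sum((yi - xi * theta) ** 2 for xi, yi in zip(x, y))
        # Normal likelihood N(x*theta, sigma2) times normal prior N(0, tau2).
        return -rss / (2.0 * sigma2) - theta ** 2 / (2.0 * tau2)

    logs = [log_posterior(t) for t in grid]
    top = max(logs)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(value - top) for value in logs]
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


# Hypothetical data for illustration.
x = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]
sigma2, tau2 = 1.0, 4.0

numeric = posterior_mean_numeric(x, y, sigma2, tau2)
closed_form = ridge_scalar(x, y, K=sigma2 / tau2)
print(numeric, closed_form)
```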

It is natural to compare the mean squared error of the ridge estimator and the least squares estimator. First we consider the case when $K$ is a constant. It is clear from (1.3) and (1.6) that for any $K > 0$

$$\mathrm{MSE}\,\tilde\theta > \mathrm{MSE}\,\hat\theta$$

for sufficiently large values of $\theta'\theta$. On the other hand, if it is known a priori that $\theta'\theta \le c$ for some positive number $c$, a valid condition in many practical situations, then from (1.6) we get

$$\mathrm{MSE}\,\tilde\theta \le \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + K)^2} + cK^2 \sum_{i=1}^{p} \frac{1}{(\lambda_i + K)^2}. \qquad (2.3)$$

Theorems 2.3 and 2.4 below give values of $K$, obtained from (2.3), for which the ridge estimator has smaller mean squared error than the least squares estimator.

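Theorem 2.3. If $\theta'\theta \le c$ then $\mathrm{MSE}\,\tilde\theta < \mathrm{MSE}\,\hat\theta$ for $0 < K \le 2\sigma^2/c$.

Proof: From (2.3) we have for $0 < K \le 2\sigma^2/c$

$$\begin{aligned}
\mathrm{MSE}\,\tilde\theta &\le \sigma^2 \sum_{i=1}^{p} \left( \frac{\lambda_i}{(\lambda_i + K)^2} + \frac{2K}{(\lambda_i + K)^2} \right) \\
&= \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i + 2K}{(\lambda_i + K)^2} \\
&< \sigma^2 \sum_{i=1}^{p} \frac{1}{\lambda_i} \\
&= \mathrm{MSE}\,\hat\theta. \quad \square
\end{aligned}$$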
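
Theorem 2.3 can be illustrated by evaluating the right hand side of (2.3) on a grid of values of $K$ in $(0, 2\sigma^2/c]$ and comparing it with (1.3). This is a minimal sketch with hypothetical values for the roots, $\sigma^2$ and $c$; it checks one example and is not a substitute for the proof.

```python
def mse_bound(eigenvalues, sigma2, c, K):
    """Right hand side of (2.3): an upper bound on the ridge MSE."""
    return sum((sigma2 * lam + c * K ** 2) / (lam + K) ** 2 for lam in eigenvalues)


# Hypothetical values for illustration.
eigenvalues = [2.0, 1.0, 0.01]
sigma2 = 1.0
c = 4.0

mse_ls = sigma2 * sum(1.0 / lam for lam in eigenvalues)  # equation (1.3)
K_max = 2.0 * sigma2 / c

# Evaluate the bound at 100 equally spaced points of (0, K_max].
grid = [K_max * k / 100.0 for k in range(1, 101)]
all_below = all(mse_bound(eigenvalues, sigma2, c, K) < mse_ls for K in grid)
print("bound below least squares MSE on the whole grid:", all_below)
```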

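Theorem 2.4. If $\theta'\theta \le \dfrac{\sigma^2}{p} \sum_{i=1}^{p} \dfrac{1}{\lambda_i}$, then $\mathrm{MSE}\,\tilde\theta \le \mathrm{MSE}\,\hat\theta$ for $K \ge 0$.

Proof: Let $D(K)$ denote the quantity on the right hand side of (2.3). Differentiating $D(K)$ with respect to $K$ we get

$$\frac{\partial D(K)}{\partial K} = \sum_{i=1}^{p} \frac{2\lambda_i (cK - \sigma^2)}{(\lambda_i + K)^3}. \qquad (2.4)$$

The right hand side of (2.4) is equal to zero for $K = \sigma^2/c$ and is $< (>)\ 0$ for $K < (>)\ \sigma^2/c$. Hence, $D(K)$ is first decreasing and then increasing as $K$ varies from $0$ to $\infty$. Now, $D(\infty) = pc$ and

$$D(0) = \sigma^2 \sum_{i=1}^{p} \frac{1}{\lambda_i} = \mathrm{MSE}\,\hat\theta.$$

Therefore

$$\mathrm{MSE}\,\tilde\theta \le D(K) \le \max(pc, \mathrm{MSE}\,\hat\theta) = \mathrm{MSE}\,\hat\theta$$

for $c \le \dfrac{\sigma^2}{p} \sum_{i=1}^{p} \dfrac{1}{\lambda_i}$. $\square$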
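
The behaviour of $D(K)$ used in the proof can be checked numerically. The sketch below compares the derivative (2.4) with a central finite difference and confirms, on a grid, that the bound is smallest near $K = \sigma^2/c$. The numerical values and function names are hypothetical.

```python
def bound_D(eigenvalues, sigma2, c, K):
    """D(K): the right hand side of (2.3)."""
    return sum((sigma2 * lam + c * K ** 2) / (lam + K) ** 2 for lam in eigenvalues)


def bound_D_derivative(eigenvalues, sigma2, c, K):
    """Equation (2.4): derivative of D(K) with respect to K."""
    return sum(2.0 * lam * (c * K - sigma2) / (lam + K) ** 3 for lam in eigenvalues)


# Hypothetical values for illustration.
eigenvalues = [2.0, 1.0, 0.01]
sigma2, c = 1.0, 4.0
K_stationary = sigma2 / c

# Central finite difference at an arbitrary point, compared with (2.4).
K0, h = 0.1, 1e-6
finite_diff = (
    bound_D(eigenvalues, sigma2, c, K0 + h) - bound_D(eigenvalues, sigma2, c, K0 - h)
) / (2.0 * h)
analytic = bound_D_derivative(eigenvalues, sigma2, c, K0)

# D at the stationary point versus D on a grid of K values in [0, 5].
grid = [5.0 * k / 1000.0 for k in range(0, 1001)]
minimum_on_grid = min(bound_D(eigenvalues, sigma2, c, K) for K in grid)
value_at_stationary = bound_D(eigenvalues, sigma2, c, K_stationary)
print(finite_diff, analytic, minimum_on_grid, value_at_stationary)
```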


An expression for $K$ minimizing $\mathrm{MSE}\,\tilde\theta$, given by (1.6), is not obtainable in a closed form. But for a given value of $\theta'\theta$, it is seen from (1.6) that $\mathrm{MSE}\,\tilde\theta$ is minimized (maximized) by setting $\alpha_i^2 = \theta'\theta$ for the value of $i$ corresponding to the largest (smallest) characteristic root of the matrix $X'X$. That is, $\mathrm{MSE}\,\tilde\theta$ is minimized (maximized) for the value of $\theta$ proportional to the characteristic vector of $X'X$ corresponding to the largest (smallest) characteristic root of the matrix. This result was also noted by Newhouse and Oman (1971). Let $\lambda_* = \min(\lambda_1, \dots, \lambda_p)$ and

$$Q_c(K) = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + K)^2} + \frac{cK^2}{(\lambda_* + K)^2}.$$

The following theorem follows from (1.6).

Theorem 2.5. A value of $K$ minimizing $Q_c(K)$ is minimax for $\mathrm{MSE}\,\tilde\theta$, given $\theta'\theta \le c$.
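
Since no closed form is available, a minimizing value of $K$ in Theorem 2.5 has to be found numerically. The sketch below uses a plain grid search, which is an assumption of this illustration rather than a method given in the paper; the grid range, the numerical values and the function names are hypothetical. It also confirms that $Q_c(K)$ equals (1.6) when all of $\theta'\theta = c$ is placed on the smallest root.

```python
def Q_c(eigenvalues, sigma2, c, K):
    """Worst case of (1.6) over theta'theta <= c, as defined before Theorem 2.5."""
    lam_star = min(eigenvalues)
    variance = sigma2 * sum(lam / (lam + K) ** 2 for lam in eigenvalues)
    worst_bias = c * K ** 2 / (lam_star + K) ** 2
    return variance + worst_bias


def mse_ridge_from_alpha_sq(eigenvalues, alpha_sq, sigma2, K):
    """Equation (1.6), taking the squared coordinates alpha_i^2 as input."""
    variance = sigma2 * sum(lam / (lam + K) ** 2 for lam in eigenvalues)
    bias = K ** 2 * sum(a2 / (lam + K) ** 2 for lam, a2 in zip(eigenvalues, alpha_sq))
    return variance + bias


def minimax_K_grid(eigenvalues, sigma2, c, K_max=10.0, steps=100000):
    """Grid search for a K in [0, K_max] minimizing Q_c(K)."""
    best_K, best_value = 0.0, Q_c(eigenvalues, sigma2, c, 0.0)
    for k in range(1, steps + 1):
        K = K_max * k / steps
        value = Q_c(eigenvalues, sigma2, c, K)
        if value < best_value:
            best_K, best_value = K, value
    return best_K, best_value


# Hypothetical values for illustration.
eigenvalues = [2.0, 1.0, 0.01]
sigma2, c = 1.0, 4.0
best_K, best_value = minimax_K_grid(eigenvalues, sigma2, c)

# All of theta'theta = c placed on the smallest root (last entry here).
worst_alpha_sq = [0.0, 0.0, c]
worst_mse = mse_ridge_from_alpha_sq(eigenvalues, worst_alpha_sq, sigma2, best_K)
print(best_K, best_value, worst_mse)
```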

Now we consider the case when $K$ depends on $Y$. In this case, the main question is: what is a suitable choice of $K$ as a function of $Y$? Some of the authors cited above have considered various choices and have compared the corresponding estimators with other estimators. Their comparison is mainly based on simulation studies, which leave many questions unanswered. In particular, it is not known whether the ridge estimator for any of those choices of $K$ has smaller mean squared error than the least squares estimator for all values of $\theta$. We consider the choice of $K$ given by

$$K^* = v\hat\sigma^2/(\hat\theta'\hat\theta) \qquad (2.5)$$

where $v$ is a positive number and

$$\hat\sigma^2 = \frac{Y'(I - X(X'X)^{-1}X')Y}{n - p}.$$

The given choice of $K^*$ is suggested by Theorem 2.2, since $\hat\sigma^2$ is an unbiased estimate of $\sigma^2$ and $(\hat\theta'\hat\theta)/p$ is an estimate of $\tau^2$. Let

$$\theta^* = (X'X + K^*I)^{-1}X'Y \qquad (2.6)$$

denote the corresponding ridge estimator. Assuming that $Y \sim N(X\theta, \sigma^2 I)$, we shall compute $\mathrm{MSE}\,\theta^*$ and compare it with $\mathrm{MSE}\,\hat\theta$. The normality assumption will be made tacitly throughout the following discussion. Under the normality assumption, $\hat\sigma^2$ and $\hat\theta$ are independently distributed.
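
The steps (2.5) and (2.6) can be sketched for a small data set. To stay within the standard library, the example below is restricted to two regressors ($p = 2$) and uses an explicit $2 \times 2$ solve; the data, the default $v = 1$ and the function names are hypothetical and are not taken from the paper.

```python
def solve_2x2(A, b):
    """Solve the 2 x 2 linear system A x = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(a22 * b[0] - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]


def ridge_with_estimated_K(X, Y, v=1.0):
    """Least squares fit, then K* from (2.5) and theta* from (2.6), for p = 2."""
    n, p = len(X), 2
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(p)]
    theta_hat = solve_2x2(XtX, XtY)
    residuals = [
        y - (row[0] * theta_hat[0] + row[1] * theta_hat[1]) for row, y in zip(X, Y)
    ]
    sigma2_hat = sum(r * r for r in residuals) / (n - p)
    K_star = v * sigma2_hat / sum(t * t for t in theta_hat)
    ridge_matrix = [
        [XtX[i][j] + (K_star if i == j else 0.0) for j in range(p)] for i in range(p)
    ]
    theta_star = solve_2x2(ridge_matrix, XtY)
    return theta_hat, sigma2_hat, K_star, theta_star


# Hypothetical data: an intercept column and a regressor with little spread,
# so that X'X is close to singular.
X = [[1.0, 1.0], [1.0, 1.1], [1.0, 0.9], [1.0, 1.05], [1.0, 0.95]]
Y = [2.1, 2.0, 2.2, 1.9, 2.3]

theta_hat, sigma2_hat, K_star, theta_star = ridge_with_estimated_K(X, Y)
print("least squares:", theta_hat)
print("K* =", K_star, "ridge:", theta_star)
```

For any $K^* > 0$ the ridge estimate is shorter than the least squares estimate, which the example makes visible.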

It is in order at this point to consider briefly the question of the inadmissibility of the least squares estimator with respect to a certain class of biased estimators. Since Stein (1955) showed that the usual estimator of the mean of a $p$-variate normal distribution is inadmissible for $p \ge 3$, a large number of papers have been written on the subject. Alam (1973, 1975), Baranchik (1973), Berger (1976), Bhattacharya (1966), Bock (1975) and Sclove (1968), to name only a few, have considered certain classes of estimators of the mean of the distribution which dominate the least squares estimator. From Theorem 5 of Bock (1975) it follows that an estimator of the form

$$f\!\left(\frac{Y'X(X'X)^{-1}X'Y}{S}\right)\hat\theta \qquad (2.7)$$

has a smaller MSE than $\hat\theta$ for all values of $\theta$, where $S$ is a random variable independent of $X'Y$ such that $S/\sigma^2$ has a chi-square distribution with $m$ degrees of freedom, $f: [0, \infty) \to [0, 1]$, $y(1 - f(y))$ is nondecreasing in $y$, $0 \le y(1 - f(y)) \le (2a - 4)/(m + 2)$ and

$$2 < a = \left(\sum_{i=1}^{p} \frac{1}{\lambda_i}\right)\lambda_*. \qquad (2.8)$$

The random variable $S$ is given, for example, by

$$S = Y'(I - X(X'X)^{-1}X')Y$$

where $m = n - p$. By Theorem 6 of Bock, the inequality (2.8) is also necessary for an estimator of the form (2.7) to have smaller MSE than the least squares estimator. Clearly, the inequality holds for $p \ge 3$ if $X'X$ is a constant multiple of the identity matrix. In this case and only in this case the ridge estimator is a multiple of $\hat\theta$.

Now we compute the mean squared error of the ridge estimator $\theta^*$, given by (2.6). Let $\chi^2_{m,\gamma}$ denote a non-central chi-square random variable with $m$ degrees of freedom and non-centrality parameter $\gamma$. Let $\phi$ be an integrable function, and let $T \sim N(\zeta, 1)$. It is easily shown that

$$E\,T\phi(T^2) = \zeta\,E\,\phi(\chi^2_{3,\zeta^2}) \qquad (2.9)$$

$$E\,T^2\phi(T^2) = E\,\phi(\chi^2_{3,\zeta^2}) + \zeta^2\,E\,\phi(\chi^2_{5,\zeta^2}). \qquad (2.10)$$

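Let $Z = (Z_1, \dots, Z_p)' = PX'Y$, where $P$ is given by (1.5). We have that $Z \sim N(D\alpha, \sigma^2 D)$, $\hat\theta'\hat\theta = \sum_{j=1}^{p} Z_j^2/\lambda_j^2$ and

$$\begin{aligned}
\mathrm{MSE}\,\theta^* &= E(\theta^* - \theta)'(\theta^* - \theta) \\
&= E\left((D + K^*I)^{-1}Z - \alpha\right)'\left((D + K^*I)^{-1}Z - \alpha\right) \\
&= E\sum_{i=1}^{p} \left( \frac{Z_i}{\lambda_i + v\hat\sigma^2 / \sum_{j=1}^{p} (Z_j^2/\lambda_j^2)} - \alpha_i \right)^2 \\
&= E\sum_{i=1}^{p} \left( \frac{\sqrt{\lambda_i}\,\sigma U_i}{\lambda_i + (v\hat\sigma^2/\sigma^2) / \sum_{j=1}^{p} (U_j^2/\lambda_j)} - \alpha_i \right)^2
\end{aligned} \qquad (2.11)$$

where $U_i = Z_i/(\sqrt{\lambda_i}\,\sigma) \sim N(\sqrt{\lambda_i}\,\alpha_i/\sigma, 1)$, so that $U_i^2 \sim \chi^2_{1,\lambda_i\alpha_i^2/\sigma^2}$. Let $V_i^2 \sim \chi^2_2$, $W_i^2 \sim \chi^2_2$ $(i = 1, \dots, p)$ be independent of the $U_i$'s and among themselves. Using (2.9) and (2.10) in (2.11) we get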


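$$\begin{aligned}
\mathrm{MSE}\,\theta^* = {} & E\sum_{i=1}^{p} \frac{\lambda_i\sigma^2}{\left(\lambda_i + \dfrac{v\hat\sigma^2/\sigma^2}{\sum_{j=1}^{p} U_j^2/\lambda_j + V_i^2/\lambda_i}\right)^2} \\
& + E\sum_{i=1}^{p} \alpha_i^2 \left[ \frac{\lambda_i^2}{\left(\lambda_i + \dfrac{v\hat\sigma^2/\sigma^2}{\sum_{j=1}^{p} U_j^2/\lambda_j + (V_i^2 + W_i^2)/\lambda_i}\right)^2} - \frac{2\lambda_i}{\lambda_i + \dfrac{v\hat\sigma^2/\sigma^2}{\sum_{j=1}^{p} U_j^2/\lambda_j + V_i^2/\lambda_i}} + 1 \right].
\end{aligned} \qquad (2.12)$$

It is not possible to simplify further the expression for $\mathrm{MSE}\,\theta^*$, except for some special cases. Therefore, we shall consider only those cases.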

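Suppose first that the $\lambda_i$'s are all equal to $\lambda$, say, and let $\tau = \lambda\theta'\theta/\sigma^2$. For large values of $\tau$ we have

$$\begin{aligned}
\mathrm{MSE}\,\theta^* &\approx \frac{\sigma^2}{\lambda} E\left[ p\left(1 - \frac{2v\chi^2_{n-p}}{(n-p)\chi^2_{p+2,\tau}}\right) + \tau\left(\frac{v\chi^2_{n-p}}{(n-p)\chi^2_{p+2,\tau}}\right)^2 \right] \\
&= \frac{\sigma^2}{\lambda} \left[ p - 2pv\,E(\chi^2_{p+2,\tau})^{-1} + \frac{\tau v^2(n-p+2)}{n-p}\,E(\chi^2_{p+2,\tau})^{-2} \right] \\
&= \frac{\sigma^2}{\lambda} \left[ p - \frac{pv\,\Gamma(p/2)}{\Gamma(p/2+1)}\,\Phi\!\left(\tfrac{p}{2}, \tfrac{p}{2}+1; \tfrac{\tau}{2}\right)e^{-\tau/2} + \frac{\tau v^2(n-p+2)\,\Gamma(p/2-1)}{4(n-p)\,\Gamma(p/2+1)}\,\Phi\!\left(\tfrac{p}{2}-1, \tfrac{p}{2}+1; \tfrac{\tau}{2}\right)e^{-\tau/2} \right] \\
&\approx \frac{\sigma^2}{\lambda} \left[ p - \frac{v}{\tau}\left(2p - \frac{v(n-p+2)}{n-p}\right) \right] \\
&= \mathrm{MSE}\,\hat\theta - \frac{v\sigma^2}{\lambda\tau}\left(2p - \frac{v(n-p+2)}{n-p}\right) \\
&< \mathrm{MSE}\,\hat\theta \quad \text{for } v < 2p(n-p)/(n-p+2)
\end{aligned} \qquad (2.13)$$

where

$$\Phi(a, b; x) = 1 + \frac{a}{b}\,x + \frac{a(a+1)}{b(b+1)}\,\frac{x^2}{2!} + \cdots \qquad (2.14)$$

denotes the confluent hypergeometric function. Since $\theta^*$ is then of the form (2.7), an application of Bock's result shows that $\mathrm{MSE}\,\theta^* \le \mathrm{MSE}\,\hat\theta$ for all values of $\theta$ if $v \le 2(p-2)(n-p)/(n-p+2)$.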

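Next suppose that $\lambda_* \to 0$ and the remaining $(p-1)$ characteristic roots are bounded away from zero. Let $\lambda_1 = \lambda_*$, $G = (n-p)\chi^2_3/(3\chi^2_{n-p})$ and $G^* = (n-p)\chi^2_5/(5\chi^2_{n-p})$. If $\lambda_*\alpha'\alpha \to 0$ then from (2.12) the value of $\mathrm{MSE}\,\theta^*$ is approximated by

$$\mathrm{MSE}\,\theta^* \approx \frac{\sigma^2}{\lambda_*}\,E\left(\frac{G}{G + v/3}\right)^2 + \sigma^2\sum_{i \ne 1} \frac{1}{\lambda_i} + \alpha_1^2\left[ E\left(\frac{G^*}{G^* + v/5}\right)^2 - 2E\left(\frac{G}{G + v/3}\right) + 1 \right]. \qquad (2.15)$$

Hence

$$\lambda_*\,(\mathrm{MSE}\,\hat\theta - \mathrm{MSE}\,\theta^*) \to \sigma^2\left(1 - E\left(\frac{G}{G + v/3}\right)^2\right). \qquad (2.16)$$

We have shown the following result.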

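Theorem 2.6. If the $\lambda_i$'s are equal to $\lambda$, say, then

$$\lim_{\theta'\theta \to \infty} (\mathrm{MSE}\,\hat\theta - \mathrm{MSE}\,\theta^*)\,\theta'\theta = \frac{v\sigma^4}{\lambda^2}\left(2p - \frac{v(n-p+2)}{n-p}\right)$$

and $\mathrm{MSE}\,\theta^* \le \mathrm{MSE}\,\hat\theta$ for $v \le 2(p-2)(n-p)/(n-p+2)$ for all values of $\theta$. If $\lambda_* \to 0$ but the other $p-1$ values of the $\lambda_i$'s are bounded away from zero, and $\lambda_*\alpha'\alpha \to 0$, then

$$\lim_{\lambda_* \to 0} \lambda_*\,(\mathrm{MSE}\,\hat\theta - \mathrm{MSE}\,\theta^*) = \sigma^2\left(1 - E\left(\frac{G}{G + v/3}\right)^2\right).$$
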


Suppose that $\lambda_i = \lambda_*$ for $r$ values of $i$ and the other values of the $\lambda_i$'s are bounded away from zero. The following theorem shows that $\mathrm{MSE}\,\theta^* < \mathrm{MSE}\,\hat\theta$ for sufficiently small values of $\lambda_*$. The proof is based on certain results given in the Appendix.

Theorem 2.7. If $0 < v < 1$ and $r > 4 + v(n-p+2)/(n-p)$ then for $\lambda_*$ sufficiently small $\mathrm{MSE}\,\theta^* < \mathrm{MSE}\,\hat\theta$ for all values of $\theta$.

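$\mathrm{MSE}\,\tilde\theta$ is bounded for any value of $\theta$ as $\lambda_* \to 0$ but $\mathrm{MSE}\,\theta^* \to \infty$. An alternative estimator for which the MSE is bounded is given by

$$\theta^{**} = (X'X + K^{**}I)^{-1}X'Y$$

where $K^{**} = \hat\sigma^2/\tilde\theta'\tilde\theta$. The mean squared error of $\theta^{**}$ is given by

$$\begin{aligned}
\mathrm{MSE}\,\theta^{**} = {} & E\sum_{i=1}^{p} \frac{\lambda_i\sigma^2}{\left(\lambda_i + \dfrac{\hat\sigma^2/\sigma^2}{\sum_{j=1}^{p} \frac{\lambda_j U_j^2}{(\lambda_j + K)^2} + \frac{\lambda_i V_i^2}{(\lambda_i + K)^2}}\right)^2} \\
& + E\sum_{i=1}^{p} \alpha_i^2 \left[ \frac{\lambda_i^2}{\left(\lambda_i + \dfrac{\hat\sigma^2/\sigma^2}{\sum_{j=1}^{p} \frac{\lambda_j U_j^2}{(\lambda_j + K)^2} + \frac{\lambda_i (V_i^2 + W_i^2)}{(\lambda_i + K)^2}}\right)^2} - \frac{2\lambda_i}{\lambda_i + \dfrac{\hat\sigma^2/\sigma^2}{\sum_{j=1}^{p} \frac{\lambda_j U_j^2}{(\lambda_j + K)^2} + \frac{\lambda_i V_i^2}{(\lambda_i + K)^2}}} + 1 \right]
\end{aligned} \qquad (2.17)$$

corresponding to (2.12) for $\mathrm{MSE}\,\theta^*$.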


APPENDIX 


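Proof of Theorem 2.7. Let $\lambda_1 = \lambda_2 = \cdots = \lambda_r = \lambda_*$, and let $\lambda_i$ be bounded away from zero for $i > r$. We consider the limiting value of $\mathrm{MSE}\,\theta^*$ as $\lambda_* \to 0$. Let $B_i$ denote the quantity inside the square bracket in (2.12). Clearly, $0 < B_i < 2$. Also, $B_i \to 0$ as $\lambda_* \to 0$ for $i > r$. Therefore, the second summation in (2.12), given $\alpha'\alpha$, is maximized in the limiting case as $\lambda_* \to 0$ by putting $\alpha_i^2 = 0$ for $i = r+1, \dots, p$. Similarly, the first summation in (2.12) is minimized in the limiting case as $\lambda_* \to 0$ by the same substitution. Therefore, we let $\alpha_i^2 = 0$ for $i = r+1, \dots, p$.

From (2.12) the value of $\mathrm{MSE}\,\theta^*$ as $\lambda_* \to 0$ is approximated by

$$\begin{aligned}
\mathrm{MSE}\,\theta^* \approx {} & \frac{r\sigma^2}{\lambda_*}\,E\left(1 + \frac{v\hat\sigma^2/\sigma^2}{\chi^2_{r+2,\,\lambda_*\alpha'\alpha/\sigma^2}}\right)^{-2} + \sigma^2\sum_{i=r+1}^{p} \frac{1}{\lambda_i} \\
& + \alpha'\alpha\,E\left[ \left(1 + \frac{v\hat\sigma^2/\sigma^2}{\chi^2_{r+4,\,\lambda_*\alpha'\alpha/\sigma^2}}\right)^{-2} - 2\left(1 + \frac{v\hat\sigma^2/\sigma^2}{\chi^2_{r+2,\,\lambda_*\alpha'\alpha/\sigma^2}}\right)^{-1} + 1 \right]
\end{aligned} \qquad (3.1)$$
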


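where the $\chi^2$ random variables are distributed independently of $\hat\sigma^2$.

Let $\delta = \lambda_*\alpha'\alpha/\sigma^2$, let $V \sim \chi^2_2$ be distributed independently of $\chi^2_{r+2,\delta}$ and $\hat\sigma^2$, and let $Q$ denote the quantity inside the square bracket in (3.1). We have

$$\begin{aligned}
E\,Q &= E\left[ \left(1 - \frac{v\hat\sigma^2/\sigma^2}{V + \chi^2_{r+2,\delta} + v\hat\sigma^2/\sigma^2}\right)^2 - 2\left(1 - \frac{v\hat\sigma^2/\sigma^2}{\chi^2_{r+2,\delta} + v\hat\sigma^2/\sigma^2}\right) + 1 \right] \\
&= E\left[ \frac{2(v\hat\sigma^2/\sigma^2)V}{(V + \chi^2_{r+2,\delta} + v\hat\sigma^2/\sigma^2)(\chi^2_{r+2,\delta} + v\hat\sigma^2/\sigma^2)} + \frac{(v\hat\sigma^2/\sigma^2)^2}{(V + \chi^2_{r+2,\delta} + v\hat\sigma^2/\sigma^2)^2} \right] \\
&\le E\left[ \frac{2(v\hat\sigma^2/\sigma^2)V}{(\chi^2_{r+2,\delta} + v\hat\sigma^2/\sigma^2)^2} + \frac{(v\hat\sigma^2/\sigma^2)^2}{(\chi^2_{r+2,\delta} + v\hat\sigma^2/\sigma^2)^2} \right] \\
&= E\,(v\hat\sigma^2/\sigma^2)(4 + v\hat\sigma^2/\sigma^2)(\chi^2_{r+2,\delta} + v\hat\sigma^2/\sigma^2)^{-2} \\
&= v\,E\left(4 + v\chi^2_{n-p+2}/(n-p)\right)\left(\chi^2_{r+2,\delta} + v\chi^2_{n-p+2}/(n-p)\right)^{-2} \\
&< v\,E\left(4 + v\chi^2_{n-p+2}/(n-p)\right)\,E\left(\chi^2_{r+2,\delta} + v\chi^2_{n-p+2}/(n-p)\right)^{-2} \\
&= v\left(4 + \frac{v(n-p+2)}{n-p}\right)E\left(\chi^2_{r+2,\delta} + v\chi^2_{n-p+2}/(n-p)\right)^{-2}.
\end{aligned} \qquad (3.2)$$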

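For the first term in (3.1) we have

$$\begin{aligned}
E\left(1 + \frac{v\hat\sigma^2/\sigma^2}{\chi^2_{r+2,\delta}}\right)^{-2} &\le 1 - E\,(v\hat\sigma^2/\sigma^2)(\chi^2_{r+2,\delta} + v\hat\sigma^2/\sigma^2)^{-1} \\
&= 1 - v\,E\left(\chi^2_{r+2,\delta} + v\chi^2_{n-p+2}/(n-p)\right)^{-1}.
\end{aligned} \qquad (3.3)$$

Combining (3.2) and (3.3), we get an upper bound on the right hand side of (3.1), given by

$$\mathrm{MSE}\,\hat\theta + \frac{v\sigma^2}{\lambda_*}\left[ \delta\left(4 + \frac{v(n-p+2)}{n-p}\right)E\left(\chi^2_{r+2,\delta} + v\chi^2_{n-p+2}/(n-p)\right)^{-2} - r\,E\left(\chi^2_{r+2,\delta} + v\chi^2_{n-p+2}/(n-p)\right)^{-1} \right]. \qquad (3.4)$$

The fifth line in (3.2) and the second line in (3.3) are obtained from the relation $E\,\chi^2_m\phi(\chi^2_m) = m\,E\,\phi(\chi^2_{m+2})$ for any integrable function $\phi$.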

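Let $R$ denote the quantity inside the square bracket in (3.4). Clearly, $R < 0$ for sufficiently small values of $\delta$. On the other hand, if $\delta \to \infty$ then

$$R \approx \left(4 + \frac{v(n-p+2)}{n-p} - r\right)\Big/\delta < 0 \quad \text{for } r > 4 + \frac{v(n-p+2)}{n-p}.$$

If $v = 0$ then

$$\begin{aligned}
R &= 4\delta\,E(\chi^2_{r+2,\delta})^{-2} - r\,E(\chi^2_{r+2,\delta})^{-1} \\
&= \left[ \frac{\delta\,\Gamma(\frac{r}{2} - 1)}{\Gamma(\frac{r}{2} + 1)}\,\Phi\!\left(\tfrac{r}{2} - 1, \tfrac{r}{2} + 1; \tfrac{\delta}{2}\right) - \Phi\!\left(\tfrac{r}{2}, \tfrac{r}{2} + 1; \tfrac{\delta}{2}\right) \right] e^{-\delta/2} \\
&= \left[ \frac{4\delta}{r(r-2)}\,\Phi\!\left(\tfrac{r}{2} - 1, \tfrac{r}{2} + 1; \tfrac{\delta}{2}\right) - \Phi\!\left(\tfrac{r}{2}, \tfrac{r}{2} + 1; \tfrac{\delta}{2}\right) \right] e^{-\delta/2}.
\end{aligned} \qquad (3.5)$$
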


Using the recurrence relation

$$b\,\Phi(a, b; x) - b\,\Phi(a-1, b; x) = x\,\Phi(a, b+1; x)$$

and the integral representation formula

$$\frac{\Gamma(b-a)\,\Gamma(a)}{\Gamma(b)}\,\Phi(a, b; x) = \int_0^1 e^{xt}\,t^{a-1}(1-t)^{b-a-1}\,dt, \quad b > a > 0,$$

it can be shown that the value of $R$, given by (3.5), is negative for $r > 4$.
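
The sign of $R$ in (3.5) can also be inspected numerically by summing the series (2.14) directly. The sketch below is an illustration for the case $v = 0$ only; the function names, the truncation of the series at a fixed number of terms and the grid of $r$ and $\delta$ values are choices made for this example.

```python
import math


def confluent_hypergeometric(a, b, x, terms=400):
    """Partial sum of the series (2.14) for Phi(a, b; x)."""
    total, term = 1.0, 1.0
    for k in range(terms):
        # Ratio of consecutive series terms: (a + k) / (b + k) * x / (k + 1).
        term *= (a + k) / (b + k) * x / (k + 1)
        total += term
    return total


def R_v0(r, delta):
    """The quantity R of (3.5), that is, the case v = 0."""
    x = delta / 2.0
    first = 4.0 * delta / (r * (r - 2.0)) * confluent_hypergeometric(
        r / 2.0 - 1.0, r / 2.0 + 1.0, x
    )
    second = confluent_hypergeometric(r / 2.0, r / 2.0 + 1.0, x)
    return (first - second) * math.exp(-x)


# Evaluate R on a small grid of r and delta values.
r_values = [5, 6, 7, 8, 9, 10]
delta_values = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 15.0]
table = {(r, d): R_v0(r, d) for r in r_values for d in delta_values}
all_negative = all(value < 0.0 for value in table.values())
print("R < 0 on the whole grid:", all_negative)
```

For the moderate arguments used here the truncated series is accurate; for much larger $\delta$ a different evaluation method would be preferable.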

If $v = n - p$ then

$$\begin{aligned}
R &= \delta(n-p+6)\,E(\chi^2_{n-p+r+4,\delta})^{-2} - r\,E(\chi^2_{n-p+r+4,\delta})^{-1} \\
&= \left[ \frac{\delta(n-p+6)\,\Gamma((n-p+r)/2)}{4\,\Gamma((n-p+r+4)/2)}\,\Phi\!\left(\tfrac{n-p+r}{2}, \tfrac{n-p+r+4}{2}; \tfrac{\delta}{2}\right) - \frac{r\,\Gamma((n-p+r+2)/2)}{2\,\Gamma((n-p+r+4)/2)}\,\Phi\!\left(\tfrac{n-p+r+2}{2}, \tfrac{n-p+r+4}{2}; \tfrac{\delta}{2}\right) \right] e^{-\delta/2}.
\end{aligned} \qquad (3.6)$$

As for (3.5) it can be shown that (3.6) is negative for $r > (n-p+6)$.

The above result suggests that the value of $R$ is negative for all $\delta$, and therefore $\mathrm{MSE}\,\theta^* < \mathrm{MSE}\,\hat\theta$, if $r > 4 + v(n-p+2)/(n-p)$. This result is consistent with the numerical values of $R$ which have been computed for $\delta = 1(1)5, 10, 15$, $v = .2(.2)1.0$, $n-p = 5(5)25$ and $r = 5(1)10$.